Bash getopts – arguments

Parsing command-line arguments

Most Unix and Linux commands take options preceded by the "minus" symbol, so to list files in long format, ordered (in reverse) by their timestamp, you use: ls -l -r -t, which can also be expressed as ls -lrt.

Some options also take arguments of their own, so you can create a tar archive of the "myfiles" directory, with the archive name "mytarfile.tar" taken from the -f option, using the command: tar -c -f mytarfile.tar myfiles. In the case of tar -c, any words after the options are taken to be the list of file(s) to put in the archive.

The tl;dr lowdown

Okay, here's the "too long - didn't read" quick synopsis. Here's how you use getopts:

while getopts 'srd:f:' c

do

  case $c in

    s) ACTION=SAVE ;;

    r) ACTION=RESTORE ;;

    d) DB_DUMP=$OPTARG ;;

    f) TARBALL=$OPTARG ;;

  esac

done

There is also the external utility getopt, which parses long-form arguments, like "--filename" instead of the briefer "-f" form. You might want to read that post, too.

Right, now that's got the busy people satisfied, we can start to explore what getopts is, how it works, and how it can be useful to your scripts.

Introducing getopts

There is a convenient shell builtin which parses these options for you; it is called getopts, and whilst its usage can feel a little strange, it allows your scripts to process options in a standardised and familiar-feeling way.

The first argument you pass to getopts is a list of which letters (or numbers, or any other single character) it will accept. Any character in that list which is followed by a colon is expected to take an argument, like the tar -f mytarfile.tar example above: tar -f always has to be followed by the name of a tar file. This option argument is passed to your script in the $OPTARG variable.

getopts will also set the $OPTIND variable for you; we will deal with that later.

The second argument that you pass to getopts is the name of a variable which will be populated with the character of the current switch. Often, this is called opt or just c, although it can have any name you choose.
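
As a minimal sketch of how the option list and the variable name work together, the hypothetical show_opts function below (it is not part of the scripts in this tutorial) accepts -a as a plain switch and -b with an argument, and prints what getopts hands back each time. Inside a function, getopts parses the function's own arguments, and setting OPTIND=1 lets the function be called more than once.

```shell
#!/bin/bash
# Hypothetical demo: -a is a plain switch, -b takes an argument (note the colon).
show_opts()
{
  OPTIND=1    # restart parsing each time the function is called
  while getopts 'ab:' opt
  do
    case $opt in
      a) echo "switch a" ;;
      b) echo "switch b with argument: $OPTARG" ;;
    esac
  done
}

show_opts -a -b hello
```

This should print "switch a" followed by "switch b with argument: hello".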

This example script can save/restore files (as a tarball) and a database. You must pass it either -s (Save) or -r (Restore). If you pass -d databasefile then it will use that name to dump (or restore) the database; if you pass -f tarball, it will use that name for the tarball to create (or extract) the files.

There are a few things this first draft of the script doesn't deal with; passing both -s and -r is invalid. If you do, this script will take the last thing you said, so dbdump.sh -s -r -d dbdump.bin -s -r -s will Save (not Restore) since the last thing it processed was the Save command.
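
To see the "last one wins" behaviour in isolation, here is a small sketch. The last_action function is a made-up helper which parses switches the same way as the first draft and reports whichever ACTION is left at the end.

```shell
#!/bin/bash
# Hypothetical helper: same option list as the first draft of the script.
last_action()
{
  unset ACTION
  OPTIND=1
  while getopts 'srd:f:' c
  do
    case $c in
      s) ACTION=SAVE ;;
      r) ACTION=RESTORE ;;
    esac
  done
  echo "$ACTION"
}

last_action -s -r -d dbdump.bin -s -r -s    # prints SAVE
last_action -s -r                           # prints RESTORE
```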

Similarly, if you don't pass at least one of -d and -f, then nothing will happen at all.

The reason for the unset DB_DUMP TARBALL ACTION is that the script does not want to be influenced by any environment variables which may be already set. Note that this will only affect the scope of the running script; the calling shell won't have its variables changed.
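
A quick way to convince yourself of this is sketched below. It assumes, purely for illustration, that the calling shell has TARBALL exported; a child bash process stands in for the script, unsets its own copy, and the caller's value is untouched afterwards.

```shell
#!/bin/bash
# Assumption for illustration: the caller happens to have TARBALL exported.
export TARBALL=/tmp/leftover.tar

# The child shell stands in for the script: it unsets only its own copy.
child_view=$(bash -c 'unset TARBALL; echo "${TARBALL:-unset}"')

echo "inside the script:  $child_view"    # unset
echo "back in the caller: $TARBALL"       # /tmp/leftover.tar
```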

For brevity, I have not defined the save_database(), save_files(), restore_database() and restore_files() functions here; the downloadable scripts do have dummy functions so that the scripts will actually run for you. They just display what would be done, but don't actually do anything to your files.
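
If you are typing the script in rather than downloading it, stand-in functions along these lines would be enough to make it run. These are my own guesses at suitable dummies, and the wording of the messages is made up; the downloadable versions may differ.

```shell
#!/bin/bash
# Hypothetical stand-ins: each one only reports what it would do.
save_database()    { echo "Would save the database to $1" ; }
restore_database() { echo "Would restore the database from $1" ; }
save_files()       { echo "Would save files to the tarball $1" ; }
restore_files()    { echo "Would restore files from the tarball $1" ; }
```

They need to be defined before the main part of the script calls them.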

download this script (getopts1.sh)

#!/bin/bash

 

unset DB_DUMP TARBALL ACTION

 

while getopts 'srd:f:' c

do

  case $c in

    s) ACTION=SAVE ;;

    r) ACTION=RESTORE ;;

    d) DB_DUMP=$OPTARG ;;

    f) TARBALL=$OPTARG ;;

  esac

done

 

if [ -n "$DB_DUMP" ]; then

  case $ACTION in

    SAVE)    save_database $DB_DUMP    ;;

    RESTORE) restore_database $DB_DUMP ;;

  esac

fi

 

if [ -n "$TARBALL" ]; then

  case $ACTION in

    SAVE)    save_files $TARBALL    ;;

    RESTORE) restore_files $TARBALL ;;

  esac

fi

The getopts command is the condition of a while loop: each time through the loop, it processes the next switch, and sets the $c variable to the character of that switch. When there are no more switches left to process, getopts returns a non-zero status and the loop ends. You can read more about loops and case in the main tutorial.

If we call this script as: dbdump.sh -s -r -d /tmp/dbdump.bin -f /tmp/files.tar -s, it will process the -s, set $c=s, and we run into the case statement for the first time. This sees that $c=s, sets $ACTION=SAVE, and the ;; at the end of that line tells it to stop processing, and it goes back to getopts for the next run around the while loop. This reads -r, which logically doesn't make sense (we can't have it both save the backup and restore the backup at the same time), but the script doesn't know that, so it sets $c=r, the case statement sets $ACTION=RESTORE, and we go back to getopts to process the next argument.

Now, getopts sets $c=d and also sets $OPTARG=/tmp/dbdump.bin, because the 'd:' in the getopts invocation tells it that -d is followed by an argument (the name of the database dump file). Execution goes on to the case statement, which sets $DB_DUMP=/tmp/dbdump.bin. When we get in to the main body of the script, if the $DB_DUMP variable has a value, then it will either save the database to that file, or restore it from that file.

The next option is -f /tmp/files.tar, and the same process is followed; getopts sets $c=f and also sets $OPTARG=/tmp/files.tar. The case statement reads these, and sets $TARBALL=/tmp/files.tar.

Finally, we passed it yet another -s switch, so it will change the $ACTION variable back to SAVE.
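
The walk-through above can be reproduced with a small tracing sketch. The trace_opts function is hypothetical; it runs the same loop as the script but prints $c and $OPTARG on every pass, showing "none" when there is no option argument.

```shell
#!/bin/bash
# Hypothetical tracer: same loop as getopts1.sh, with an echo on every pass.
trace_opts()
{
  unset DB_DUMP TARBALL ACTION
  OPTIND=1
  while getopts 'srd:f:' c
  do
    echo "c=$c OPTARG=${OPTARG:-none}"
    case $c in
      s) ACTION=SAVE ;;
      r) ACTION=RESTORE ;;
      d) DB_DUMP=$OPTARG ;;
      f) TARBALL=$OPTARG ;;
    esac
  done
  echo "ACTION=$ACTION DB_DUMP=$DB_DUMP TARBALL=$TARBALL"
}

trace_opts -s -r -d /tmp/dbdump.bin -f /tmp/files.tar -s
```

The loop runs five times (s, r, d, f, s), and the final line reads ACTION=SAVE DB_DUMP=/tmp/dbdump.bin TARBALL=/tmp/files.tar.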

When the main script starts, it checks if $DB_DUMP is set, then checks the value of $ACTION, and either saves or restores the database, using $DB_DUMP, according to the value of $ACTION.

Similarly, it checks if $TARBALL has been set, and either saves or restores the files with $TARBALL as the argument.

Round Two: Error Checking

This second version of the script uses a couple of useful functions. It is often convenient to have a usage() function, which tells the user the correct way to call the script, and exits with a non-zero error code, to express that it has failed.

It also has a set_variable() function, which can be useful. This script doesn't really need it - the example above just set the variables as it needed to. But since this function does various bits of error checking, it's worth putting all of that into a function and calling it multiple times, rather than repeating the "is this variable already set?" code for every command-line option.

It uses eval and variable indirection. When called as set_variable ACTION SAVE, it sets $varname=ACTION, then checks 'if [ -z "${!varname}" ]'. The exclamation mark before varname tells the shell to replace that with the value of $varname, so the -z test will check whether the $ACTION variable is of zero length. If so, it evaluates $varname=\"$@\", which works out as ACTION=SAVE in this case. It could just use: eval "$varname=$1", but that wouldn't allow for spaces in the filenames, when we come to use this function to set those, too.
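
Here are the two pieces on their own, outside of any function, as a minimal sketch. The value variable and the filename are made up for the demonstration; ${!varname} is a bash feature, so this needs bash rather than a plain sh.

```shell
#!/bin/bash
# ${!varname} is bash indirection: it expands to the value of the variable
# whose name is stored in varname.
unset ACTION TARBALL
varname=ACTION
[ -z "${!varname}" ] && echo "ACTION is empty"

# eval runs the string as shell code, so this performs ACTION="SAVE".
eval "$varname=\"SAVE\""
echo "ACTION is now ${!varname}"

# The escaped quotes keep a value containing spaces in one piece.
varname=TARBALL
value="/tmp/my files.tar"    # made-up filename with a space in it
eval "$varname=\"$value\""
echo "TARBALL is now $TARBALL"
```

Bear in mind that eval will run whatever ends up in the string, so a value containing a double quote or $(...) would be interpreted by the shell. If the values are not under your control, bash's printf -v is one alternative way to assign to a variable by name.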

The real reason that set_variable() is useful is that if ${!varname} is not zero length, the function will spit out an error message, that $varname is already set, and call the usage() function, which reminds the user of the correct syntax, and exits the whole script.
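
To watch that failure path without ending your own shell session, the sketch below runs the two functions inside a command substitution. The exit in usage() then only ends that subshell, and the status can be inspected afterwards. The output and status variables are just names chosen for this demonstration.

```shell
#!/bin/bash
usage()
{
  echo "Usage: $0 [-s|-r] [ -d DB_DUMP ] [ -f TARBALL ]"
  exit 2
}

set_variable()
{
  local varname=$1
  shift
  if [ -z "${!varname}" ]; then
    eval "$varname=\"$@\""
  else
    echo "Error: $varname already set"
    usage
  fi
}

unset ACTION
status=0
# The second call finds ACTION already set, so it reports the error and exits
# the subshell with status 2.
output=$( set_variable ACTION SAVE; set_variable ACTION RESTORE ) || status=$?
echo "$output"
echo "exit status: $status"
```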

The next change to this script from the first example, is that it adds -? and -h switches. These are common ways to query a program to find out what the correct usage is. So if the user runs it as: dbdump.sh -? or dbdump.sh -h, it will show them the usage() message, and exit.
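
It is worth knowing that getopts also sets the variable to a question mark whenever it meets a switch which is not in its list, so the same branch of the case statement catches typing mistakes too. The sketch below uses a hypothetical help_demo function, which prints a message instead of calling usage(). The 2>/dev/null hides the complaint that getopts itself prints, and the backslash makes the pattern match a literal question mark only.

```shell
#!/bin/bash
# Hypothetical demo: report which branch each switch ends up in.
help_demo()
{
  OPTIND=1
  while getopts 'srd:f:h' c 2>/dev/null
  do
    case $c in
      h|\?) echo "would call usage" ;;
      *)    echo "got -$c" ;;
    esac
  done
}

help_demo -h    # would call usage
help_demo -x    # would call usage (the unknown switch arrives as ?)
help_demo -s    # got -s
```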

Two final checks before the main script gets underway: the first ensures that $ACTION has been set (it will be empty if the script was called without either -s or -r). If $ACTION is zero length, it calls the usage function.

Then, it checks that at least one of $DB_DUMP and $TARBALL has been set. The logic of this is a little unintuitive: if the -z "$DB_DUMP" test passes (that is, $DB_DUMP is empty), then the && passes execution to the next test, -z "$TARBALL". If that also passes the test, it continues execution via the second && to the usage() function, which will display the message and terminate the script.

If either of those tests failed, then at least one of them is set, the script does have something to do, and the usage() function does not get called, which means that the script is allowed to continue.
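
The four possible combinations can be checked with a small sketch. The check_targets function is made up for this purpose: its two arguments stand in for $DB_DUMP and $TARBALL, and it echoes "usage" rather than exiting.

```shell
#!/bin/bash
# Hypothetical helper: $1 stands in for $DB_DUMP, $2 for $TARBALL.
check_targets()
{
  [ -z "$1" ] && [ -z "$2" ] && { echo "usage"; return; }
  echo "continue"
}

check_targets "" ""                               # usage
check_targets /tmp/dbdump.bin ""                  # continue
check_targets "" /tmp/files.tar                   # continue
check_targets /tmp/dbdump.bin /tmp/files.tar      # continue
```

Only the case where both are empty reaches the end of the && chain.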

Once all that has been done, it calls the save/restore functions as before.

download this script (getopts2.sh)

#!/bin/bash

 

usage()

{

  echo "Usage: $0 [-s|-r] [ -d DB_DUMP ] [ -f TARBALL ]"

  exit 2

}

 

set_variable()

{

  local varname=$1

  shift

  if [ -z "${!varname}" ]; then

    eval "$varname=\"$@\""

  else

    echo "Error: $varname already set"

    usage

  fi

}

 

#########################

# Main script starts here

 

unset DB_DUMP TARBALL ACTION

 

while getopts 'srd:f:?h' c

do

  case $c in

    s) set_variable ACTION SAVE ;;

    r) set_variable ACTION RESTORE ;;

    d) set_variable DB_DUMP "$OPTARG" ;;

    f) set_variable TARBALL "$OPTARG" ;;

    h|?) usage ;;

  esac

done

 

[ -z "$ACTION" ] && usage

[ -z "$DB_DUMP" ] && [ -z "$TARBALL" ] && usage

 

if [ -n "$DB_DUMP" ]; then

  case $ACTION in

    SAVE) save_database $DB_DUMP ;;

    RESTORE) restore_database $DB_DUMP ;;

  esac

fi

 

if [ -n "$TARBALL" ]; then

  case $ACTION in

    SAVE) save_files $TARBALL ;;

    RESTORE) restore_files $TARBALL ;;

  esac

fi

Option Index: $OPTIND

We mentioned above that the other variable that getopts will set for you is the index of where you are up to in processing the options; this is the $OPTIND variable. This is a bit of an odd one; it's the index of the next argument to be processed, so if your script is called as: "dbdump.sh -s -d foo -f bar", then as it's processing -s, $OPTIND is 2. When it's processing -d foo, $OPTIND is 4, because the next thing it will process will be the 4th argument ("-f").
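
You can watch this happen with a short sketch. The optind_trace function is hypothetical; it uses the same option list as dbdump.sh and prints $OPTIND each time around the loop.

```shell
#!/bin/bash
# Hypothetical tracer: print OPTIND as each switch is processed.
optind_trace()
{
  OPTIND=1
  while getopts 'srd:f:' c
  do
    echo "processing -$c, OPTIND is $OPTIND"
  done
}

optind_trace -s -d foo -f bar
```

Run under bash, this reports OPTIND as 2 for -s, 4 for -d and 6 for -f.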

The $OPTIND variable is useful when your script processes some switches, followed by further arguments. For example, ls -ltr /tmp/*.txt /tmp/*.png stops processing switches when it finds the first non-option argument (/tmp/*.txt).
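
getopts behaves in the same way: it stops at the first argument which is not a switch, and anything after that is left alone, even if it looks like a switch. In this sketch, parse_until_file is a made-up function and the filename is only an example.

```shell
#!/bin/bash
# Hypothetical demo: collect the switches seen before the first non-option.
parse_until_file()
{
  OPTIND=1
  seen=""
  while getopts 'ltr' c
  do
    seen="$seen$c"
  done
  echo "switches: $seen, first non-option index: $OPTIND"
}

parse_until_file -ltr /tmp/a.txt -r
```

This prints "switches: ltr, first non-option index: 2". The trailing -r comes after /tmp/a.txt, so getopts never looks at it.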

This third example processes its command line arguments, and then operates on any files it has been given.

download this script (getopts3.sh)

#!/bin/bash

unset VERBOSE

while getopts 'smv' c

do

  echo "Processing $c : OPTIND is $OPTIND"

  case $c in

    s) ACTION=sha1sum ;;

    m) ACTION=md5sum ;;

    v) VERBOSE=true ;;

  esac

done

 

echo "Out of the getopts loop. OPTIND is now $OPTIND"

shift $((OPTIND-1))

if [ $VERBOSE ]; then

  set -x

fi

$ACTION "$@"

If you pass it -s, it will check the sha1sum checksum of the files; -m tells it to do a md5sum checksum instead. To help us see how $OPTIND changes, a -v switch enables verbose execution.

After getopts has finished parsing the switches, it exits the loop, with $OPTIND set to 2 (if only one switch was used) or 3 (if two separate switches were used). Therefore, we need to call shift with $OPTIND-1 to get rid of the first 1 (or 2) arguments from the command line. This leaves $@ with the rest of the command line, which should be a list of files.
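
The reason for shifting by $OPTIND-1, rather than counting switches yourself, shows up when switches are bundled together. In this sketch, remaining_args is a hypothetical function using the same option list as getopts3.sh, and the filenames are placeholders.

```shell
#!/bin/bash
# Hypothetical demo: parse the switches, shift them away, report what is left.
remaining_args()
{
  OPTIND=1
  while getopts 'smv' c
  do
    :    # the switches themselves are not interesting here
  done
  shift $((OPTIND-1))
  echo "$# left: $*"
}

remaining_args -v -m one.sh two.sh    # 2 left: one.sh two.sh
remaining_args -vm one.sh two.sh      # 2 left: one.sh two.sh
remaining_args one.sh                 # 1 left: one.sh
```

Whether the switches arrive as -v -m, as -vm, or not at all, the shift removes exactly the arguments that getopts consumed.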

Here, I ran the script from this directory, which contains getopts1.sh, getopts2.sh and getopts3.sh, so the *sh on the command line expands to these three files.

$ ./getopts3.sh -v -m *sh

Processing v : OPTIND is 2

Processing m : OPTIND is 3

Out of the getopts loop. OPTIND is now 3

+ md5sum getopts1.sh getopts2.sh getopts3.sh

e0f015e6f47709c9639589c98886c429  getopts1.sh

ac16051be13728378f8827c0db84c6ae  getopts2.sh

41521e2253685ab252f87ee820b466b7  getopts3.sh

$

So when the script processes the -v switch, $OPTIND=2. It then moves on to -m and $OPTIND=3. So the script calls shift 2 by working out the value of $OPTIND-1. It can now call $ACTION *.sh, which will evaluate as md5sum getopts1.sh getopts2.sh getopts3.sh (or sha1sum getopts1.sh getopts2.sh getopts3.sh, if you ran it with the -s switch).